Principal Methods

ABC-Net [147] is another network designed to improve the performance of binary networks. ABC-Net approximates the full-precision weight filter W with a linear combination of M binary filters B1, B2, ..., BM ∈ {−1, +1} such that W ≈ α1B1 + ··· + αMBM. These binary filters are fixed as follows:

$$B_i = F_{u_i}(W) := \operatorname{sign}\big(\bar{W} + u_i\,\mathrm{std}(W)\big), \quad i = 1, 2, \ldots, M, \tag{1.11}$$

where W̄ is the mean-centered W (W with its mean subtracted) and std(W) is the standard deviation of W. For activations, ABC-Net employs multiple binary activations to alleviate information loss. Like the binarized weights, the real activation I is estimated using a linear combination of N binary activations A1, A2, ..., AN such that I = β1A1 + ··· + βNAN, where

$$A_1, A_2, \ldots, A_N = H_{v_1}(R), H_{v_2}(R), \ldots, H_{v_N}(R). \tag{1.12}$$

Hv(R) in Eq. 1.12 is a binary function, h is a bounded activation function, 𝕀 is the indicator function, and v is a shift parameter. Unlike the weight case, the parameters βn and vn are trainable. Without explicit linear regression, the network tunes the βn's and vn's during training and fixes them for testing. They are expected to learn and utilize the statistical features of full-precision activations.
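To make the approximation concrete, the NumPy sketch below builds the M binary bases of Eq. 1.11, fits the α coefficients by least squares, and applies a shifted binarization in the spirit of Eq. 1.12. The uniform spacing of the shift parameters u_i and all function names are illustrative assumptions, not the original ABC-Net implementation.

```python
# Minimal sketch of ABC-Net-style multi-bit weight approximation (Eq. 1.11),
# assuming M shift factors u_i spaced uniformly in [-1, 1].
import numpy as np

def abc_weight_approx(W, M=3):
    """Approximate W by sum_i alpha_i * B_i with B_i = sign(W_bar + u_i * std(W))."""
    W_bar = W - W.mean()                       # mean-centered weights
    std = W.std()
    u = np.linspace(-1.0, 1.0, M)              # fixed shift parameters u_i (assumed spacing)
    B = np.stack([np.sign(W_bar + ui * std) for ui in u])   # (M, ...) binary bases
    B[B == 0] = 1                              # keep entries strictly in {-1, +1}
    # Solve min_alpha || W - sum_i alpha_i B_i ||^2 by linear least squares.
    A = B.reshape(M, -1).T                     # (num_weights, M)
    alpha, *_ = np.linalg.lstsq(A, W.ravel(), rcond=None)
    W_hat = np.tensordot(alpha, B, axes=1)     # reconstructed real-valued filter
    return B, alpha, W_hat

def binarize_activation(R, v=0.0):
    """Shifted binary activation H_v(R) = 2 * 1[R >= v] - 1, as in Eq. 1.12."""
    return 2.0 * (R >= v) - 1.0

# Usage: approximate a random 16x16x3x3 filter bank with M = 3 binary bases.
W = np.random.randn(16, 16, 3, 3)
B, alpha, W_hat = abc_weight_approx(W, M=3)
print(np.abs(W - W_hat).mean())                # approximation error shrinks as M grows
```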

Ternary-Binary Network (TBN) [228] is a CNN with ternary inputs and binary weights. Based on accelerated ternary-binary matrix multiplication, TBN uses efficient operations such as XOR, AND, and bit count in standard CNNs, and thus provides an optimal trade-off between memory, efficiency, and performance. Wang et al. [233] propose a simple yet effective two-step quantization framework (TSQ) by decomposing network quantization into two steps: code learning and transformation function learning based on the learned codes. TSQ fits primarily into the class of 2-bit neural networks.
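The following sketch illustrates, under stated assumptions, how a ternary-binary inner product can be evaluated with XOR, AND, and bit count, in the spirit of TBN's accelerated multiplication. The packed encoding (a non-zero mask plus a sign mask) and the helper names are illustrative, not the exact TBN kernel.

```python
# Sketch of a ternary-binary dot product via bitwise ops (assumed encoding).
import numpy as np

def pack_ternary(x):
    """Encode a ternary vector x in {-1, 0, +1} as (nonzero_mask, sign_mask)."""
    nonzero = sign = 0
    for i, xi in enumerate(x):
        if xi != 0:
            nonzero |= 1 << i
        if xi > 0:
            sign |= 1 << i
    return nonzero, sign

def pack_binary(w):
    """Encode a binary vector w in {-1, +1} as a sign bit mask."""
    sign = 0
    for i, wi in enumerate(w):
        if wi > 0:
            sign |= 1 << i
    return sign

def ternary_binary_dot(x_packed, w_sign):
    """dot(x, w) = (#sign agreements) - (#sign disagreements) over non-zero inputs."""
    nonzero, x_sign = x_packed
    agree = nonzero & ~(x_sign ^ w_sign)       # x != 0 and signs match
    n_active = bin(nonzero).count("1")
    return 2 * bin(agree).count("1") - n_active

# Usage: matches the plain floating-point dot product.
x = np.random.choice([-1, 0, 1], size=32)
w = np.random.choice([-1, 1], size=32)
assert ternary_binary_dot(pack_ternary(x), pack_binary(w)) == int(np.dot(x, w))
```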

Local Binary Convolutional Network (LBCNN) [109] proposes a local binary convolution (LBC) layer motivated by local binary patterns (LBP), an image descriptor rooted in the face recognition community. The LBC layer consists of a set of fixed, sparse, predefined binary convolutional filters that are not updated during training, a non-linear activation function, and a set of learnable linear weights. The linear weights combine the activated filter responses to approximate the corresponding activated filter responses of a standard convolutional layer. The LBC layer often affords significant parameter savings, with 9x to 169x fewer learnable parameters than a standard convolutional layer. Furthermore, the sparse and binary nature of the weights also results in up to 169x savings in model size compared to a conventional convolution.
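The sketch below is a minimal PyTorch rendition of an LBC-style layer as described above: fixed sparse binary filters, a non-linear activation, and learnable 1x1 weights that combine the responses. The sparsity level, the number of anchor filters, and the class name are assumptions made for illustration.

```python
# Minimal LBC-style layer: frozen sparse binary filters + learnable 1x1 combination.
import torch
import torch.nn as nn

class LocalBinaryConv(nn.Module):
    def __init__(self, in_ch, out_ch, n_anchor=32, kernel_size=3, sparsity=0.1):
        super().__init__()
        # Fixed, sparse, predefined binary filters (not updated during training).
        anchor = torch.zeros(n_anchor, in_ch, kernel_size, kernel_size)
        mask = torch.rand_like(anchor) < sparsity
        signs = torch.randint(0, 2, anchor.shape).float() * 2 - 1   # {-1, +1}
        anchor[mask] = signs[mask]
        self.register_buffer("anchor_weight", anchor)               # frozen, no gradient
        self.act = nn.ReLU(inplace=True)
        # Learnable linear (1x1) weights combining the activated responses.
        self.linear = nn.Conv2d(n_anchor, out_ch, kernel_size=1, bias=False)
        self.pad = kernel_size // 2

    def forward(self, x):
        responses = nn.functional.conv2d(x, self.anchor_weight, padding=self.pad)
        return self.linear(self.act(responses))

# Usage: drop-in stand-in for a 3x3 convolution mapping 16 -> 32 channels.
layer = LocalBinaryConv(16, 32)
y = layer(torch.randn(1, 16, 28, 28))
print(y.shape)   # torch.Size([1, 32, 28, 28])
```

Only the 1x1 combination weights are trained, which is where the parameter savings over a standard convolution come from.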

Modulated Convolutional Networks (MCN) [236] first introduce modulation filters (M-Filters) to recover the binarized filters. M-Filters are designed to approximate unbinarized convolutional filters in an end-to-end framework. Each layer shares only one M-Filter, leading to a significant reduction in model size. To reconstruct the unbinarized filters, they introduce a modulation process based on the M-Filters and binarized filters. Figure 1.1 is an example of the modulation process. In this example, the M-Filter has four planes, each of which can be expanded to a 3D matrix according to the channels of the binarized filter. After the operation between the binarized filter and each expanded M-Filter, the reconstructed filter Q is obtained.
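A minimal NumPy sketch of this reconstruction step is given below: each plane of a shared M-Filter is expanded over the channels of a binarized filter and combined with it element-wise to produce Q. Reading the combination as an element-wise product is an assumption made for illustration, as are the function and variable names.

```python
# Sketch of MCN-style modulation: expand each M-Filter plane over the channels
# of a binarized filter and combine element-wise (assumed operation).
import numpy as np

def modulate(binary_filter, m_filter):
    """binary_filter: (C, kH, kW) in {-1, +1}; m_filter: (K, kH, kW) real-valued.

    Returns Q with shape (K, C, kH, kW): one reconstructed filter per M-Filter plane.
    """
    C = binary_filter.shape[0]
    Q = []
    for plane in m_filter:                                   # each of the K planes
        expanded = np.repeat(plane[None, :, :], C, axis=0)   # expand to (C, kH, kW)
        Q.append(binary_filter * expanded)                   # element-wise modulation
    return np.stack(Q)

# Usage: a 3x3 binarized filter with 4 channels and a shared M-Filter with 4 planes.
B = np.sign(np.random.randn(4, 3, 3))
M = np.abs(np.random.randn(4, 3, 3))     # one M-Filter shared by the whole layer
Q = modulate(B, M)
print(Q.shape)                            # (4, 4, 3, 3)
```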

As shown in Fig. 1.2, the reconstructed filters Q are used to calculate the output feature maps F. There are four planes in Fig. 1.2, so the number of channels in the feature maps is also 4. With MCN convolution, every feature map has the same number of input and output channels, allowing the module to be replicated and MCNs to be easily implemented.

Unlike previous work in which the model binarizes each filter independently, Bulat et al. [23] propose parameterizing each layer's weight tensor using a matrix or tensor decomposition. The binarization process uses latent parametrization through a quantization function